Abstract. When drawing graphs whose edges and nodes contain text
or graphics, such information needs to be displayed without overlaps,
either as part of the initial layout or as a post-processing step. The core
problem in removing overlaps lies in retaining the structural information
inherent in a layout, minimizing the additional area required, and keeping
edges as straight as possible. This paper presents a unified node and edge
overlap removal algorithm that does well at solving this problem.

1 Introduction

Most existing graph layout algorithms for undirected graphs treat nodes as points
and edges as straight lines. In practice, both nodes and edges often contain labels
or graphics that need to be displayed. Naively incorporating these can lead to
labels that overlap (see, e.g., Figure 1), causing the information of some labels to
occlude that of others. It is therefore important to remove such overlaps either
by taking into account label sizes during layout or as a post-processing step.
Assuming that the original layout captures significant structural information
such as clusters, the goal of any layout that avoids overlaps should be to retain
the "shape" of the layout based on point nodes.

The simplest approach is to scale up the drawing [31] while preserving label size until
the labels no longer overlap. This has the advantage of preserving the shape of
the layout, but is often impractical due to inconveniently large drawings. Label
overlap removal is typically a trade-off between preserving the shape, limiting
the area, and keeping edges straight, with scaling at one extreme.

1.1 Related work: node label overlap removal 

Many techniques to avoid label overlaps have been devised. One approach is to
make the node size part of the model of the layout algorithm. For hierarchical
layouts, the node size can be naturally incorporated into the algorithm [11,34]. 
For symmetric layouts, various authors [5, 17, 29, 36] have extended the spring- 
electrical model [9, 12] to take into account node sizes, usually as increased re- 
pulsive forces. Node overlap removal can also be built into the stress model [27] 
by specifying the ideal edge length to avoid overlaps along the graph edges. Such 
heuristics, however, do not always remove all of the node overlaps, and have to 
be augmented with a post-processing step. 

An alternative approach is to remove node overlaps as a post-processing
step after the graph is laid out. Here the trade-off between layout size and
preserving the graph's shape is much more explicit. A number of algorithms 
have been proposed. For example, the Voronoi cluster busting algorithm [15,30] 
works by iteratively forming a Voronoi diagram from the current layout and 
moving each node to the center of its Voronoi cell until no overlaps remain. 
The idea is that restricting each node to its corresponding Voronoi cell should 
preserve the relative positions of the nodes. In practice, because of the number 
of iterations often required and the use of a rectangular bounding box (the 
latter might be improved by using something like α-shapes [10] to provide a
more accurate boundary), the nodes in the final drawing can be homogeneously 
distributed, bearing little resemblance to the original layout. 

Another group of post-processing algorithms is based on maintaining the 
orthogonal ordering [32] of the initial layout as a way to preserve its shape. A 
force scan algorithm and variants were proposed [18,21,29,32] based on these 
constraints. Marriott et al. [8,31] presented a quadratic programming algorithm 
which removes node overlaps while minimizing node displacement and keeping 
the orthogonal ordering. Each of these algorithms has a separate pass in both 
the horizontal and vertical directions. In practice, this asymmetry often results 
in a layout with a distorted aspect ratio [13]. 

While preserving orthogonal ordering is certainly important, it is more gen- 
eral to preserve relative proximity relations [20]. Gansner and Hu [13] proposed 
a node overlap removal algorithm that used a proximity graph as a scaffolding to 
maintain the proximity relations, and employed a sparse stress model to remove 
node overlaps. The algorithm was demonstrated to be scalable for large graphs. 
It produces a layout that closely resembles the original layout, but is free of node
overlaps, and takes comparatively little additional area. 

1.2 Related work: edge label overlap removal or placement 

Edge label overlap removal or placement has also been studied by many authors. 
One of the often studied problems, though not the focus of this paper, is that 
of edge label placement (ELP), where the geometry of the graph, other than 
the labels, is assumed to be fixed. The problem is finding the best placement 
positions for the edge labels to minimize label overlaps, as well as making the 
association between edges and their corresponding labels unambiguous. This 
is highly related to the map labeling problem [4,25,33]. Kakoulis and Tollis 
[23,24,26] proved that the ELP problem is NP hard, and presented an algorithm 
for labeling the edges of hierarchical drawings. The algorithm was also used for 
graph drawings of other styles [6,7]. 

A number of algorithms have been proposed for drawing graphs without edge
label overlaps. Castell et al. [3] presented a procedure for drawing state charts,
with node and edge labels. Edge label overlaps are avoided as part of the layout 
process by truncating long label strings and allocating enough space vertically 
between layers. Binucci et al. [1] present MILP models to compute optimal
drawings without edge label overlaps, and an exact algorithm and several heuristics
to compute such drawings with minimum area. Klau and Mutzel [28] proposed
a branch and cut algorithm which computes optimally labeled orthogonal
drawings.

In this paper we study the problem of visualizing undirected graphs with both 
node and edge labels. We assume that an aesthetic, symmetric layout based on 
point nodes is available. The task is to adjust the node positions so that node
labels and edge labels can be placed without overlaps. In doing so, we want 
to retain the "shape" of the layout. To minimize the additional area required, 
we allow edges to be bent, but strive to draw them as straight as possible. We 
present (Section 3) a unified overlap removal algorithm based on a proximity
graph of the positions of node and edge labels in the original layout. Using this
graph as a guide, it iteratively moves the labels to remove overlaps, while keeping 
the relative positions between them as close to those in the original layout as 
possible, and edges as straight as possible. The algorithm is similar to the stress 
model [27] used for graph layout, except that the stress function involves only a 
sparse selection of all possible node pairs. It is an extension to the node overlap 
removal algorithm of Gansner and Hu [13] to deal with both node and edge labels. 
We evaluate our algorithm on two graphs from applications. Finally, Section 4
presents a summary and topics for further study. 

2 Background 

We use $G = (V, E)$ to denote an undirected graph, with $V$ the set of nodes
(vertices) and $E$ the set of edges. We use $|V|$ and $|E|$ for the number of vertices and edges,
respectively. We let $x_i$ represent the current coordinates of vertex $i$ in Euclidean
space.

The aim of graph drawing is to find $x_i$ for all $i \in V$ so that the resulting
drawing gives a good visual representation of the information in the graph.
Two popular methods, the spring-electrical model [9,12] and the stress model
[27], both convert the problem of finding an optimal layout to that of finding a
minimal energy configuration of a physical system. We describe here the stress
model in more detail, as we shall use a similar model for the purpose of label
overlap removal in Section 3.

The stress model assumes that there are springs connecting all nodes of
the graph, with the ideal spring length equal to the graph theoretical distance
between nodes. The energy of this spring system is

$$\sum_{i \ne j} w_{ij} \left( \|x_i - x_j\| - d_{ij} \right)^2, \quad (1)$$

where $d_{ij}$ is the graph theoretical distance between vertices $i$ and $j$, and $w_{ij}$
is a weight factor, typically $1/d_{ij}^2$. The layout that minimizes the above stress
energy is an optimal layout of the graph. A robust technique to find an optimum
of this model is stress majorization, where the cost function (1) is bounded by a
series of quadratic functions from above, and the process of finding an optimum
becomes that of solving a series of linear systems [14].
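
As a small illustration of the stress energy, the following sketch evaluates it for a given layout. The function name `stress` and its arguments are our own choices for illustration, not part of any library. It counts each unordered pair of vertices once (summing over ordered pairs would simply double the value) and defaults to the typical weight $1/d_{ij}^2$.

```python
import math

def stress(pos, d, w=None):
    """Stress energy of a layout.

    pos: list of (x, y) coordinates, one per vertex.
    d:   matrix of graph theoretical distances, d[i][j] > 0 for i != j.
    w:   optional weight matrix; defaults to 1 / d[i][j]**2.
    Each unordered pair {i, j} is counted once.
    """
    n = len(pos)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            w_ij = w[i][j] if w is not None else 1.0 / d[i][j] ** 2
            total += w_ij * (math.dist(pos[i], pos[j]) - d[i][j]) ** 2
    return total

# Path graph 0 - 1 - 2: graph distances are 1, 1 and 2.
d = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]
straight = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]  # every distance is ideal
folded = [(0.0, 0.0), (1.0, 0.0), (0.0, 0.0)]    # vertex 2 folded onto vertex 0
print(stress(straight, d), stress(folded, d))
```

The straight drawing realizes every graph distance exactly, so its stress is zero, while folding the path back on itself is penalized through the pair (0, 2).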

In the stress model, the graph theoretical distance between all pairs of vertices
has to be calculated, leading to quadratic complexity in the number of
vertices. There have been attempts (e.g., [2, 14]) to simplify the stress function 
by considering only a sparse portion of the graph. Our experience, however, is 
that these techniques can fail to yield good layouts on real-life graphs. Therefore, 
algorithms based on a spring-electrical model employing a multilevel approach 
and an efficient approximation scheme for long range repulsive forces [16, 19,35] 
are still the most efficient choices when laying out large graphs without consid- 
eration of the node or edge labels. It is worth pointing out that the label overlap 
removal algorithm proposed in this paper works with any symmetric layout, 
regardless of how it is generated. 

3 A Unified Model for Label Overlap Removal 

Our goal is to remove label overlaps while preserving the shape of the initial 
layout by maintaining the proximity relations [20] among the nodes. Recently 
we proposed an efficient algorithm PRISM [13] to remove node overlaps based 
on the proximity stress model. We first summarize this model, then describe how 
to extend it to solve the more general problem of overlap removal for both node 
and edge labels. 

3.1 The PRISM algorithm for node overlap removal 

We first set up a rigid "scaffolding" structure so that while vertices can move 
around, their relative positions are maintained. This scaffolding is constructed 
using an approximation to a proximity graph [22], the Delaunay triangulation
(DT). 

We then check every edge in DT to see whether there is any node overlap along
that edge. Let $W_i$ and $H_i$ denote the half width and half height of node $i$, and
$x_i^0(1)$ and $x_i^0(2)$ the current X and Y coordinates of this node. If $i$ and $j$ form
an edge in the DT, we calculate the overlap factor of these two nodes,

$$t_{ij} = \max\left( \min\left( \frac{W_i + W_j}{|x_i^0(1) - x_j^0(1)|}, \frac{H_i + H_j}{|x_i^0(2) - x_j^0(2)|} \right), 1 \right). \quad (2)$$

For nodes that do not overlap, $t_{ij} = 1$. For nodes that do overlap, such overlaps
can be removed if we expand the edge by this factor. Therefore we want to
generate a layout such that an edge in the proximity graph has the ideal edge
length close to $t_{ij}\|x_i^0 - x_j^0\|$. In other words, we want to minimize the following
stress function

$$\sum_{(i,j) \in E_P} w_{ij} \left( \|x_i - x_j\| - d_{ij} \right)^2. \quad (3)$$

Here $d_{ij} = s_{ij}\|x_i^0 - x_j^0\|$ is the ideal distance for the edge $\{i,j\}$, $s_{ij}$ is a scaling
factor related to the overlap factor $t_{ij}$ (see (4)), $w_{ij} = 1/d_{ij}^2$ is a scaling factor,
and $E_P$ is the set of edges of the proximity graph. We call (3) the proximity
stress model.
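
The overlap factor can be sketched in a few lines. The function name `overlap_factor` and the tuple layout for centers and half sizes are assumptions made for this illustration. When two centers share a coordinate, the corresponding ratio is treated as infinite, so the other axis decides.

```python
def overlap_factor(c_i, half_i, c_j, half_j):
    """Overlap factor of two axis-aligned boxes.

    c_i, c_j:       box centers as (x, y).
    half_i, half_j: (half width, half height) of each box.
    Returns the factor by which the center distance must grow so that
    the boxes just stop overlapping, or 1.0 if they do not overlap.
    Two boxes with identical centers yield infinity.
    """
    dx = abs(c_i[0] - c_j[0])
    dy = abs(c_i[1] - c_j[1])
    ratio_x = (half_i[0] + half_j[0]) / dx if dx > 0 else float("inf")
    ratio_y = (half_i[1] + half_j[1]) / dy if dy > 0 else float("inf")
    return max(min(ratio_x, ratio_y), 1.0)

# Two 2 x 1 boxes whose centers are 1 apart horizontally and 0.2 vertically.
t = overlap_factor((0.0, 0.0), (1.0, 0.5), (1.0, 0.2), (1.0, 0.5))
print(t)
```

In this example, stretching the segment between the two centers by the returned factor moves them 2 apart horizontally, which is exactly the sum of the half widths, so the boxes just touch.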

Because DT is a planar graph, which has no more than $3|V| - 3$ edges, the
above stress function has no more than $3|V| - 3$ terms. Furthermore, because
DT is rigid, it provides a good scaffolding that constrains the relative position 
of the vertices and helps to preserve the global structure of the original layout. 

It is important that we do not attempt to remove overlaps in one iteration by
minimizing the above model with $s_{ij} = t_{ij}$, since doing so can create high localized
stress and cause deviation from the original layout. Hence we damp the overlap factor by setting

$$s_{ij} = \min(t_{ij}, s_{\max}) \quad (4)$$

and try to remove overlaps a little at a time. Here $s_{\max} > 1$ is a number limiting
the amount of overlap we are allowed to remove in one iteration. Typically
$s_{\max} = 1.5$.
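
To see what the damping does, the sketch below lists the scaling factors applied to a single overlapping pair over successive iterations. It assumes, purely for illustration, that every iteration attains its ideal edge length exactly, which the actual stress minimization only approximates; the function name `damping_schedule` is hypothetical.

```python
def damping_schedule(t, s_max=1.5, tol=1e-12):
    """Per-iteration scaling factors min(t, s_max) for one pair.

    t is the initial overlap factor of the pair. The idealizing
    assumption is that after each iteration the pair sits exactly at
    its ideal distance, so the remaining overlap factor is t / s.
    """
    factors = []
    while t > 1.0 + tol:
        s = min(t, s_max)   # damped overlap factor for this iteration
        factors.append(s)
        t /= s              # overlap that is still left afterwards
    return factors

print(damping_schedule(2.0))
print(damping_schedule(5.0))
```

Under this idealization a pair with overlap factor 2 is separated in two steps rather than one, and the product of the factors equals the original overlap factor.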

After minimizing (3) we arrive at a layout that may still have node overlaps.
We then regenerate the proximity graph using DT and calculate the overlap
factor along the edges of this graph, and redo the minimization. This forms an
iterative process that ends when there are no more overlaps along the edges
of the proximity graph. At this stage there may still be node overlaps between
nodes that do not constitute an edge of the DT, so a scan-line algorithm [8] is
used to find all overlaps, and the proximity graph is augmented with additional
edges, where each edge consists of a pair of nodes that overlap. We then re-solve
(3). This process is repeated until the scan-line algorithm finds no more overlaps.
We call this algorithm PRISM (PRoxImity Stress Model).
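
The scan-line step can be sketched as a simple sweep over boxes sorted by their left edge. This is a minimal illustration under our own assumptions, not necessarily the algorithm of [8]: the name `find_overlaps` is hypothetical, boxes that merely touch are not reported, and the active set is a plain list, so the worst case remains quadratic.

```python
def find_overlaps(centers, halves):
    """Return all pairs (i, j), i < j, of strictly overlapping boxes.

    centers: list of (x, y) box centers.
    halves:  list of (half width, half height), one per box.
    """
    # Visit boxes from left to right by their left edge.
    order = sorted(range(len(centers)),
                   key=lambda i: centers[i][0] - halves[i][0])
    active = []  # boxes whose x-interval may still reach the sweep line
    pairs = []
    for i in order:
        left_i = centers[i][0] - halves[i][0]
        # Drop boxes that end at or before the current left edge.
        active = [j for j in active
                  if centers[j][0] + halves[j][0] > left_i]
        # Remaining boxes overlap box i in x; test the y-intervals.
        for j in active:
            if abs(centers[i][1] - centers[j][1]) < halves[i][1] + halves[j][1]:
                pairs.append((min(i, j), max(i, j)))
        active.append(i)
    return sorted(pairs)

centers = [(0.0, 0.0), (1.5, 0.0), (5.0, 0.0), (1.0, 3.0)]
halves = [(1.0, 0.5)] * 4
print(find_overlaps(centers, halves))
```

Each reported pair would then be added as an extra edge to the proximity graph before re-solving.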

Figure 1 shows the result of using PRISM on a graph representing the routes
of the Olympic torch relay from summer 1936 to summer 2008. This graph is taken
from the GD'08 contest. Nodes represent countries, and for a country that has
hosted the Olympics, the year (e.g., "S2008" means summer 2008) in which the
Olympics was hosted is also included in the node label. The edges are color
coded, representing the different years and seasons in which the relays took place. Warm colors
are used for summer games and cool colors for winter games. Edge labels are used
to highlight the year in which that edge was traversed. In the figure, node overlaps are
removed using PRISM. However, due to the presence of edge labels, there are a
lot of overlaps among edge labels and between node and edge labels. As a result,
it is very difficult to follow the relay routes, even with the help of the coloring
scheme. This and other applications call for an effective algorithm for removing
node as well as edge label overlaps.



3.2 The penalized proximity stress model for node and edge label 
overlap removal 

As Figure 1 demonstrates, when there are both node and edge labels, simply
removing node overlaps is not sufficient. Instead, we need an algorithm to take 
into account the overlaps between both node and edge labels. 




Fig. 1. The Olympic torch relay graph with node overlaps removed, but before 
edge label overlaps are removed. 



The first algorithm we propose is a simple one. For each edge $\{i, j\}$ that has
a label $k$, we break the edge down into two edges, $\{i, k\}$ and $\{k, j\}$, with the
addition of a new node $k$. We call this expanded graph $G_a = \{V_a, E_a\}$. If we
denote by $V_e$ the set of additional nodes derived from edge labels, then the node set
of the expanded graph is the union of $V_e$ and the nodes in the original graph, or
$V_a = V \cup V_e$.
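
The construction of the expanded graph can be sketched as follows. The function name `expand_graph` and the data layout are assumptions for illustration. Each label node is placed at the midpoint of its edge, which is one simple way to obtain initial positions for the labels.

```python
def expand_graph(edges, labels, pos):
    """Split every labeled edge {i, j} into {i, k} and {k, j}.

    edges:  list of (i, j) pairs over nodes 0 .. len(pos) - 1.
    labels: dict mapping an edge index to its label text.
    pos:    list of (x, y) positions of the original nodes.
    Returns (expanded edges, expanded positions, label_nodes), where
    label_nodes maps each new node k to the end nodes (i, j) of its edge.
    """
    new_edges = []
    new_pos = list(pos)
    label_nodes = {}
    for e, (i, j) in enumerate(edges):
        if e in labels:
            k = len(new_pos)  # index of the new label node
            new_pos.append(((pos[i][0] + pos[j][0]) / 2,
                            (pos[i][1] + pos[j][1]) / 2))
            new_edges.extend([(i, k), (k, j)])
            label_nodes[k] = (i, j)
        else:
            new_edges.append((i, j))
    return new_edges, new_pos, label_nodes

edges = [(0, 1), (1, 2)]
labels = {0: "S1948"}  # only the first edge carries a label
pos = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0)]
print(expand_graph(edges, labels, pos))
```

Keeping the map from label nodes to their end nodes is convenient later, when label nodes need to be treated differently from ordinary nodes.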

A simple algorithm for visualizing graphs with both node and edge labels
is to apply PRISM to a layout of this expanded graph to arrive at an overlap-free
drawing. We call this algorithm vPRISM. Figure 2 gives the drawing of the
Olympic torch relay graph using vPRISM. As can be seen, vPRISM is able to
avoid label overlaps and allows us to follow the routes of the torch relay quite 
well. However, the shortcoming of this approach is also evident. Because we are 
treating node labels and edge labels the same way, some of the edges have very 
sharp bends. For example, at the far left corner, the bend along the edge between 
"LUX" and "BEL" , with an edge label "S1948" , is very sharp. While this may 
be alleviated to some extent by the use of spline edges, it would be good to avoid 
these bends in the first place. 

Fig. 2. The Olympic torch relay graph after node and edge label overlaps are
removed using vPRISM.

We propose the following ePRISM algorithm. First, we lay out the graph
without considering overlap removal. We can either lay out the expanded graph
$G_a$, resulting in a position for all node and edge labels, or we can lay out the
original graph $G$, and place the edge labels at the center of the edges. We then
form a proximity graph through Delaunay triangulation using the positions of
both node and edge labels, and calculate the overlap factors along the edges of
DT as in (2). At this point, if we solve the proximity stress model (3), we will
simply get the vPRISM algorithm. Instead, we want to make sure that edges of
the original graph are as straight as possible. Suppose $k$ is a vertex representing the
edge label on edge $\{i, j\} \in E$. We want $x_k$ to lie on the line between $x_i$ and
$x_j$, preferably near the center of that line. Therefore we add to (3) a penalty
term $\|x_k - (x_i + x_j)/2\|^2$, leading to a penalized proximity stress model



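$$\sum_{(i,j) \in E_P} w_{ij} \left( \|x_i - x_j\| - d_{ij} \right)^2 + p \sum_{k \in V_e,\ \{i,k\} \in E_a,\ \{j,k\} \in E_a} w_k \left\| x_k - \frac{x_i + x_j}{2} \right\|^2 \quad (5)$$

to be minimized. Here $E_P$ is the edge set of the proximity graph of the expanded
graph $G_a$, scalar $p$ is a penalty parameter, scaling factors $w_{ij} = 1/d_{ij}^2$ are
applied to the stress term, and scaling factors $w_k = 1/\|x_i - x_j\|^2$, where $x_i$ and
$x_j$ are the positions of the two nodes corresponding to edge label $k$, are applied to the penalty term.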
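
The penalized objective can be evaluated directly, which is handy for checking a solver. The sketch below is an illustration under our own naming (`penalized_stress`, `prox_edges`, `label_nodes`); it takes the ideal distances as given and uses the typical weights described above.

```python
import math

def penalized_stress(x, prox_edges, d, label_nodes, p=4.0):
    """Proximity stress plus p times the straightness penalty.

    x:           list of (x, y) positions of all nodes of the expanded graph.
    prox_edges:  list of proximity graph edges (i, j).
    d:           dict mapping each proximity edge to its ideal distance.
    label_nodes: dict mapping a label node k to the end nodes (i, j)
                 of the edge it labels.
    """
    stress = 0.0
    for (i, j) in prox_edges:
        w_ij = 1.0 / d[(i, j)] ** 2
        stress += w_ij * (math.dist(x[i], x[j]) - d[(i, j)]) ** 2
    penalty = 0.0
    for k, (i, j) in label_nodes.items():
        mid = ((x[i][0] + x[j][0]) / 2, (x[i][1] + x[j][1]) / 2)
        w_k = 1.0 / math.dist(x[i], x[j]) ** 2
        penalty += w_k * math.dist(x[k], mid) ** 2
    return stress + p * penalty

# Edge {0, 1} of length 4 whose label node 2 sits 1 unit off the midpoint.
x = [(0.0, 0.0), (4.0, 0.0), (2.0, 1.0)]
prox_edges = [(0, 2), (1, 2)]
d = {e: math.dist(x[e[0]], x[e[1]]) for e in prox_edges}  # no overlap to remove
print(penalized_stress(x, prox_edges, d, {2: (0, 1)}))
```

In this toy case the stress term vanishes, so the whole value comes from the bend; moving the label node onto the midpoint trades a small amount of stress for a straight edge.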

The penalized proximity stress model (5) can be solved using the stress majorization
technique [14], with a modification to account for the penalty term.
This technique bounds the stress term by a sequence of quadratic functions from
above. Finding the minimum of the quadratic functions can be formulated as
the solution of sparse linear systems relating to the Laplacian of the proximity
graph. In the case of our model, the linear system to be solved repeatedly is



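$$(L_w + p L_e)\, x = L_{w,d}\, x^0 \quad (6)$$

where $x^0$ is the current layout, and $x$ is the new layout that improves the stress
and penalty terms. The weighted Laplacian matrix $L_w$ has elements

$$(L_w)_{ij} = \begin{cases} \sum_{\{i,l\} \in E_P} w_{il}, & i = j \\ -w_{ij}, & i \ne j,\ \{i,j\} \in E_P \end{cases}$$

The matrix $L_e$ comes from the penalty term and is defined as

$$(L_e)_{ij} = \begin{cases} w_i, & i = j,\ i \in V_e \\ \sum_{k \in N(i) \cap V_e} 0.25\, w_k, & i = j,\ i \in V \\ -0.5\, w_i, & i \in V_e,\ \{i,j\} \in E_a \\ -0.5\, w_j, & j \in V_e,\ \{i,j\} \in E_a \\ 0.25\, w_k, & \{i,k\} \in E_a,\ \{j,k\} \in E_a,\ k \in V_e \end{cases}$$

with $N(i)$ the set of neighboring nodes of $i$ in the expanded graph $G_a$, and
matrix $L_{w,d}$ has elements

$$(L_{w,d})_{ij} = \begin{cases} \sum_{\{i,l\} \in E_P} w_{il}\, d_{il} / \|y_i - y_l\|, & i = j \\ -w_{ij}\, d_{ij} / \|y_i - y_j\|, & i \ne j,\ \{i,j\} \in E_P \end{cases}$$

where $y = x^0$ is the layout at which the quadratic bound is formed. Entries not listed are zero.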

The addition of matrix $L_e$ to $L_w$ makes the overall matrix on the left hand
side of (6) somewhat denser, by introducing at most $6|E|$ extra entries if every
edge has an edge label. However, as long as the original graph is sparse, (6)
is still a sparse system, and can be solved efficiently using the conjugate gradient
algorithm. Algorithm 1 gives a detailed description of the ePRISM algorithm.
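
As a sanity check on the penalty matrix, the sketch below assembles a dense $L_e$ for a tiny example and verifies that its quadratic form reproduces the penalty term for one coordinate. The names `penalty_matrix` and `quad_form` are hypothetical, a dense list of lists stands in for a sparse matrix, and the diagonal contribution of $0.25\,w_k$ for each end node is what expanding $w_k (x_k - (x_i + x_j)/2)^2$ gives.

```python
def penalty_matrix(n, label_nodes, w):
    """Dense L_e for n nodes of the expanded graph.

    label_nodes: dict mapping label node k to its end nodes (i, j).
    w:           dict mapping label node k to its penalty weight w_k.
    """
    L = [[0.0] * n for _ in range(n)]
    for k, (i, j) in label_nodes.items():
        L[k][k] += w[k]                  # label node diagonal
        for a in (i, j):
            L[k][a] -= 0.5 * w[k]        # label node to end node
            L[a][k] -= 0.5 * w[k]
            L[a][a] += 0.25 * w[k]       # end node diagonal
        L[i][j] += 0.25 * w[k]           # coupling between the end nodes
        L[j][i] += 0.25 * w[k]
    return L

def quad_form(L, v):
    """Compute v^T L v for a dense matrix and a coordinate vector."""
    n = len(v)
    return sum(v[a] * L[a][b] * v[b] for a in range(n) for b in range(n))

# One labeled edge {0, 1} with label node 2, looking at a single coordinate.
L_e = penalty_matrix(3, {2: (0, 1)}, {2: 0.5})
v = [0.0, 4.0, 3.0]
direct = 0.5 * (v[2] - (v[0] + v[1]) / 2) ** 2
print(quad_form(L_e, v), direct)
off_diagonal = sum(1 for a in range(3) for b in range(3) if a != b and L_e[a][b] != 0.0)
print(off_diagonal)
```

For this single labeled edge the matrix gains six off-diagonal entries, in line with the bound of $6|E|$ extra entries when every edge carries a label.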



Algorithm 1 ePRISM: an algorithm for node and edge label overlap removal

Input: coordinates $x_i^0$ for each node and edge label, and bounding box width and
height $\{W_i, H_i\}$, $i = 1, 2, \ldots, |V_a|$.
repeat
    Form a proximity graph $G_P$ of $x^0$ by Delaunay triangulation.
    Find the overlap factors (2) along all edges in $G_P$.
    Solve the penalized proximity stress model (5) for $x$. Set $x^0 = x$.
until (convergence)
repeat
    Form a proximity graph $G_P$ of $x^0$ by Delaunay triangulation.
    Find all node overlaps using a scan-line algorithm. Augment $G_P$ with edges from
    node pairs that overlap.
    Find the overlap factor (2) along all edges of $G_P$.
    Solve the penalized proximity stress model (5) for $x$. Set $x^0 = x$.
until (convergence)
Remove any remaining overlaps using PRISM.



Unlike PRISM, where the only objective is to remove overlaps, in ePRISM we
have the dual objectives of removing overlaps as well as keeping edges straight.
Therefore, while in PRISM the lack of overlaps is the measure of convergence,
here, in the two main loops of the ePRISM algorithm, we define convergence
as $\|x - x^0\|/\|x^0\| < \epsilon$, with $\epsilon = 0.005$, and limit the number of iterations to
1000. Because of the dual objectives, it is possible that after the two main loops
have converged, some overlaps may still exist, for example among edge labels
on multiple edges between the same pair of nodes. Therefore we use the PRISM
algorithm to remove any overlaps that may still remain. We set the penalty
parameter $p$ to 4.
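
The stopping rule of the two main loops can be sketched as a small driver. The names `relative_change` and `iterate_until_converged` are hypothetical, the `step` argument stands in for one pass of forming the proximity graph and solving the model, and the norm is assumed to be the Euclidean norm over all coordinates stacked together, which the text does not spell out.

```python
import math

def relative_change(x_new, x_old):
    """||x_new - x_old|| / ||x_old|| over all coordinates of all nodes."""
    num = math.sqrt(sum((a - b) ** 2
                        for p, q in zip(x_new, x_old)
                        for a, b in zip(p, q)))
    den = math.sqrt(sum(a * a for p in x_old for a in p))
    return num / den

def iterate_until_converged(step, x0, eps=0.005, max_iter=1000):
    """Apply step repeatedly until the layout changes by less than eps.

    Returns the final layout and the number of iterations used.
    """
    x_old = x0
    for it in range(1, max_iter + 1):
        x_new = step(x_old)
        if relative_change(x_new, x_old) < eps:
            return x_new, it
        x_old = x_new
    return x_old, max_iter

# Toy step: move a single point halfway toward the target (2, 0).
def toy_step(x):
    return [((px + 2.0) / 2, py / 2) for (px, py) in x]

layout, iterations = iterate_until_converged(toy_step, [(1.0, 0.0)])
print(layout, iterations)
```

Because convergence here means that the layout has stopped moving rather than that overlaps are gone, a final overlap removal pass is still needed afterwards.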




Fig. 3. The Olympic torch relay graph after node and edge label overlaps are 
removed using ePRISM. 

The ePRISM algorithm has a similar computational complexity to PRISM,
when the latter is applied to the expanded graph $G_a$. However, we found that
ePRISM tends to take more iterations to converge in practice, due to the difficulty
in resolving the conflicting objectives of removing label overlaps and keeping
edges straight.

Figure 3 gives the result of ePRISM on the Olympic torch relay graph.
Clearly, the majority of edges are now straight, and those edges that are not
straight have milder bends. Some of these bends are unavoidable, such as when
there are multiple edges between the same pair of nodes. In general, compared
with Figure 2, Figure 3 is more pleasing to look at and easier to follow due to
the great reduction of edge bends.



Fig. 4. Visualization of countries sharing the same city names. Two countries sharing
more than 20 city names are linked by an edge, with the number of shared names,
as well as two such names, shown as the edge label. Node and edge label overlaps
are removed using ePRISM.

Finally, we apply ePRISM to another application. Here, we visualize countries
and how they relate. Two countries are related if there are more than 20
cities with the same name that exist in both countries. From the drawing, we
can see that the United States is the country with the most connections to other
countries, reflecting the fact that most people in the country emigrated from 
another country in the last three centuries, and that cities in the United States 
are often named by the immigrants using names of cities in their home countries. 
The United States and the United Kingdom have the greatest number of cities 
with the same name (327), followed by Canada (250), Germany (86), and Italy 
(74). The United States also shares city names with a large group of countries 
in South America, seen on the right hand side of the figure. Canada, on the 
other hand, has the United States, the United Kingdom and France as its three 
closest connected countries, reflecting the historical tie of Canada to these three 
countries. 

4 Conclusions and Future Work 

The main contribution of this paper is a new algorithm for removing both node
and edge label overlaps, based on a penalized proximity stress model. The algorithm is
shown to produce layouts that display both node and edge labels without overlaps
and that are aesthetically pleasing, with mostly straight edges.



For future work, we would like to investigate a combination of this algorithm 
and edge label placement (ELP) techniques, so that those edge labels that can be 
placed within the current geometry without introducing ambiguity or overlaps 
are dealt with separately from those that have to be treated using the algorithm
proposed in this paper. 